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An efficient and robust linear scaling method is presented for large scale ab initio electronic 
structure calculations of a wide variety of materials including metals. The detailed short range and 
the effective long range contributions to the electronic structure are taken into account by solving 
an embedded cluster defined in a Krylov subspace, which provides rapid convergence for not only 
insulators but also metals. As an illustration of the method, we present a large scale calculation 
based on density functional theory for a palladium cluster with a single iron impurity. 
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The development of linear scaling method is a promis- 
ing direction of extending applicability of ab initio elec- 
tronic structure calculations based on density functional 
theory (DFT) to large scale realistic systems [l], Q- In 
fact, over the last decade, considerable efforts have been 
devoted to establish efficient and robust linear scaling 
methods HHHI3, and successful applications have been 
reported within non-SCF tight binding (TB) scheme^. 
Nevertheless, within the fully self consistent field (SCF) 
DFT much improvement is still needed to reduce the er- 
ror below chemical accuracy (a few milli-Hartree/atom), 
referred to as milli-Hartree accuracy hereafter, for a wide 
variety of materials with modest computational cost. In 
addition, the application to ab initio calculations is prac- 
tically hampered by an intractable feature that an ap- 
proximate solution of eigenstates by the linear scaling 
methods often induces instabilities in the SCF calcu- 
lation, fn this paper to overcome these difficulties we 
present an efficient and robust linear scaling method for 
a wide variety of materials including metals in which 
ideas behind two linear scaling methods, divide-conquer 
(DC) 0| and recursion methods^Q, are unified in a sin- 
gle framework. It is known that the DC method pro- 
vides rapid convergence for covalent systems such as bi- 
ological molecules with numerical stability during the 
SCF calculation^. However, the application of the DC 
method to metals is significantly restricted by require- 
ment of the large size of truncated cluster. On the 
other hand, the recursion method based on Lanczos al- 
gorithms and Green's functions is one of suitable meth- 
ods for metals 0,0, although the SCF calculation with 
the recursion method becomes unstable. The main idea 
behind the recursion method is to employ a Krylov sub- 
space generated by the Lanczos algorithm in evaluating 
Green's functions^, 0, Q, and this is the reason why 
the recursion method can provide rapid convergence for 
metals. Thus, we propose a novel method which pos- 
sesses the advantages in two methods and overcomes the 
drawbacks. Let us assume that the basis set consists of 



nonorthogonal localized functions such as pseudo-atomic 
orbitals (PAO)|| and finite elements basis 0- Through- 
out this paper we use the PAO x as basis function to 
expand one-particle wave functions [8j- The charge den- 
sity p a (r) associated with spin component a is evaluated 
via Green's function G by 

p (,) w = E»«towft. (i) 

where i is a site index, a an orbital index, and nf a 
density matrix given by 

= / Gl]^E + K + )f(^)dE (2) 

with the Fermi function f(x) = 1/[1 + exp(x)]. Since 
only the charge density p a and the density matrix n a are 
required in the conventional DFT, we focus on the eval- 
uation of Green's functions in later discussion. The spin 
index a will hereafter be dropped for simplicity of nota- 
tion. It is noted that the DC and recursion methods pro- 
vide different ways of evaluating Green's functions from a 
local Hamiltonian H and overlap S matrices constructed 
from the local environment for each atom0,0,0| . By tak- 
ing into account the rapid convergence as observed in the 
recursion method and considering that the Krylov sub- 
space for the non-orthogonal basis functions is defined by 
{\W ), S~ l H\W ), (S^HflWo), (S-'H^lWo)}, we 
introduce a Krylov subspace U for each atom given by 

U = WXA _1 , (3) 

where W = {\W ), \Wi), |W 2 ), \W N )}, A and X are 
eigenvalues and corresponding eigenvectors of an overlap 
matrix W^SW. The W is generated by the following 
procedure: (i) \R n ) = QH\W n -x), (ii) \W£ = \R n ) - 
Em=o \W m )(W m \S\R m ), (iii) \W n ) = S-orthonormalized 
block vector of |W,'). In this procedure, Q is the in- 
verse of a local overlap matrix S constructed from the 
same truncated cluster as in the construction of the lo- 
cal Hamiltonian H, where the cluster is constructed by 
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a logically truncation method 5] . In case the large size 
of truncated cluster is required for reduction of the com- 
putational error, Q can be substituted by an approx- 
imate inverse given by Vs -1 V"f with s = V^SV and 
V = {|F ), \Vi), \V 2 ), \V N >)} wh ich is generated by the 
following procedure: (I) \Y m ) = S\V m -i), (II) \V^) = 
l^m)-E™=o 1 \V P )(V p \Y m ) (III) \V m ) = I-orthonormalized 
block vector of \V^ n ). The initial states \Vq) and |Wn) 
consist of block I-orthonormalized vectors and its S- 
orthonormalized vectors, and the optimum choice of |Vo) 
depends on the system as discussed later on. In the gen- 
eration scheme, we impose only the S-orthogonahty be- 
tween Krylov vectors without assuming any specific form 
for the representation of the Hamiltonian matrix, while 
a tridiagonal form of the Hamiltonian matrix is imposed 
in the Lanczos algorithm jj, I3. Although the procedure 
(i)-(iii) gives a set of S'-orthonormal Krylov vectors in 
principle, the S'-orthonormality is not well assured due to 
round-off error in the Gram-Schmidt orthogonalization. 
Therefore, the Krylov subspace U is given by an orthog- 
onal transformation Eq. (3). For numerical stability, it 
is crucial to construct the Krylov subspace U at the first 
SCF step, and to fix it during subsequent steps. If the 
Krylov subspace is regenerated at every SCF step, the 
SCF convergence becomes significantly worse because of 
fluctuation of the spanned space, which is the reason for 
the instability inherent in the recursion method coupled 
with the SCF calculation as discussed later on. By con- 
sidering a further spatial division of the truncated cluster 
into a core and the remaining buffer regions, and taking 
the Krylov subspace representation, the original general- 
ized eigenvalue problem Hc^ — e^Sc^ for the truncated 
cluster can be transformed to a standard eigenvalue prob- 
lem HH 



e M 6 M with 



U*HU 

u\H c u c 



4-HcbUb + u\Hl h u c + ulH h u b 



(4) 



where H c , H\ )1 and H cb are Hamiltonian matrices, repre- 
sented by the original basis functions \-> f° r the core and 
buffer regions, and between the core and buffer regions, 
respectively. Considering that the Krylov subspace U 
is decomposed to contributions of the core and buffer re- 
gions: TJt = (ul,ut), it is straightforward to see that H K 



is composed by a short range H* 



itH r u r and the other 



long range contributions i?| K . Since the required buffer 
size to satisfy the milli-Hartree accuracy can be large in 
most cases for metals, therefore, once the long range con- 
tributions iJj K is calculated at the first SCF step, the ma- 
trix is fixed during subsequent steps, while it it possible 
to update Hg after achieving the self consistency. Then, 
the standard eigenvalue problem is diagonalized with a 
updated Hf- and the fixed Hf 1 during subsequent steps, 
which means that the detailed short range contribution 
to the electronic structure can be taken into account with 
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FIG. 1: (a) absolute error, with respect to the band calcula- 
tion, in the total energy (Hartree/atom) for bcc lithium bulk 
as a function of atoms in each truncated cluster calculated by 
three linear scaling methods, (b) computational time (s) for 
the diagonalization part per MD step. The number of atoms 
in the core region is 113. The set of numbers in the parenthe- 
sis means the percentage of the dimension of the subspaces 
W and V relative to the total number of basis functions in 
the truncated cluster, respectively. 



an effective correction by the long range contribution ifj. 
Thus, the evaluation of Green's functions is mapped to a 
cluster problem analogous to the DC methodpj but with 
the effective smaller Hamiltonian H K . The core region 
in this study is defined by a cluster within atoms having 
non zero overlap XiaXjp to the central atom i. In this 
case the required components in the eigenvectors for the 
evaluation of the charge density Eq. (1) is easily eval- 



uated by a back transform Cu 



It is noted that 



the evaluation of Green's function and the density matrix 
Eq. (2) is trivial in the same way as the DC method^ 
since we have the eigenvalues e M and its corresponding 
eigenvectors c M . 

In Fig. 1 the absolute error in the total energy and 
the computational time calculated by three linear scaling 
methods, the proposed, DC, and recursion methods, are 
shown as a function of number of atoms in each truncated 
cluster for bcc lithium bulk. All the calculations in this 
study were performed by a DFT code, QpenMX lsl llfl . 
with a generalized gradient approximation (GGA) 11] to 
the exchange-correlation potential. It is found that three 
methods are equivalent in terms of the accuracy. How- 
ever, we see that the computational time of the proposed 
method is remarkably reduced compared to those of the 
DC and recursion methods. In the proposed method the 
dimension of the Krylov subspace W and that of the sub- 
space V for the approximate inverse of the overlap matrix 
are 7 % and 28 % of the total number of basis functions in 
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FIG. 2: The residual norm of charge density as a function of 
SCF steps calculated by the band method, the proposed, its 
variant with the regeneration of the Krylov subspace, DC, and 
recursion methods for fee aluminum bulk. In the proposed 
method, the core and buffer regions contain 55 and 326 atoms, 
respectively. For the set of numbers in the parenthesis, see 
the caption of Fig. 1. 



the truncated cluster, respectively. In spite of the consid- 
erable reduction of the spanned space, the method gives 
the same result as that of the DC method, which clearly 
shows rapid convergence of the proposed method based 
on the Krylov subspace. The difference between the pro- 
posed and recursion methods in the computational time 
is attributed to the regeneration of the Krylov subspace 
and the evaluation of Eq. (2) in the recursion method. 

To compare the numerical stability, the SCF conver- 
gence is shown in Fig. 2 for the conventional band and 
four linear scaling methods for fee aluminum. The resid- 
ual norm of charge density by the band, proposed, and 
DC methods quickly decreases, while the convergent re- 
sult is hardly obtained in the proposed method with the 
regeneration of the Krylov subspace and the recursion 
method. The comparison between the proposed method 
and its variant with the regeneration of the Krylov sub- 
space suggests a reason why the recursion method tends 
to suffer from the numerical instability. The regenera- 
tion of the Krylov subspace makes the spanned subspace 
fluctuate, which means that an eigenvalue problem de- 
fined by a different subspace is solved at every SCF step. 
This fluctuation of the spanned space causes the diffi- 
culty in obtaining the SCF convergence for the recursion 
method. On the other hand, in the proposed method 
without the regeneration of the Krylov subspace the dif- 
ficulty is avoided since the spanned space is fixed during 
the SCF calculation. 

In Fig. 3 the absolute error in the total energy calcu- 
lated by the proposed and DC methods are shown for a 
wide variety of materials. Several trends in the conver- 
gence properties can be found in this comparison. It is 
obvious that the large truncated cluster is required to sat- 
isfy the milli-Hartree accuracy for simple metals such as 
aluminum and lithium (see also Fig. 1). However, a rela- 
tively smaller dimension of the Krylov subspace is enough 
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FIG. 3: (a) absolute error, with respect to the band calcu- 
lations, in the total energy (Hartree/atom) calculated by the 
proposed and DC methods for metals and finite gap systems, 
(b) computational time (s/atom/MD). For a substantial com- 
parison, the calculations were performed using a single Xeon 
processor. The set of numbers in the parenthesis of (a) means 
the average number of atoms in the core and buffer regions. 
For that in (b), see the caption of Fig. 1. 



for the convergence. For the B32-LiAl alloy and the tran- 
sition metal Fe, the relative dimension of the Krylov sub- 
space required for the convergence increases compared to 
that for the simple metals. For systems with a finite gap, 
the total energy converges to the milli-Hartree accuracy 
even in a small truncated cluster especially for DNA with 
a periodic double helix structure (650 atoms/unit) con- 
sisting of cytosines and guanines, while the dimension of 
the Krylov subspace for the convergence is comparable to 
that of the original space defined by the truncated clus- 
ter. Therefore, in comparison with the DC method, the 
proposed method is more efficient especially for metal- 
lic systems, and that the efficiency becomes comparable 
as the covalency and ionicity in the electronic structure 
increase. The crossing point between the proposed and 
conventional methods in the computational time is es- 
timated to be about 800 atoms for silicon bulk on the 
serial computation, while it varies depending on the sys- 
tem and calculation conditions. It is also interesting that 
the convergence rate with respect to the Krylov subspace 
dimension depends on the choice of \Vq). For metals and 
insulators, we find that an optimum choice for \Vq) is a 
set of the basis functions in the central atom and the 
central atom plus the neighboring atoms, respectively, 
which may be related to different convergence properties 
of constituents such as itinerant, a, and tt electrons in 
the electronic structures |5|. 

As an illustration of the proposed method, we present 
a large scale calculation for a truncated octahedral palla- 
dium cluster with a single iron impurity, FeiPdsoellSj. I n 
the linear scaling calculation the truncated cluster is con- 
structed by atoms within a sphere with a radius of 9.2 A, 
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FIG. 4: Magnetic moment of Pd atoms in a cluster, FeiPdso6 , 
with a central iron impurity as a function of the layer number 
along [100]- and [lll]-axes. The total magnetic moment of 
the cluster is 344 ((j,b). The calculation was performed using 
six nodes (16 CPUs/node) of a SR11K machine. 



and for example in fee Pd the core and buffer regions con- 
tain 55 and 170 atoms under this condition, respectively. 
The dimension of the Krylov subspace W is 600, and 
the value corresponds to about 30 % of the total num- 
ber of basis functions in the truncated cluster for fee Pd. 
The exact inverse of the local overlap matrix is used for 
Q. The calculation condition gives 0.00217 and 0.00012 
(Hartree/atom) as the absolute error in the total energy 
for fee Pd and bee Fe, respectively, compared to the con- 
ventional band calculations. In Fig. 4 the magnetic mo- 
ment of Pd atoms in the cluster FeiPdso6 is shown as a 
function of the layer number along [100]- and [lll]-axes, 
where in each layer the value of the nearest atom to the 
axes is considered. It is found that a peak appears around 
the second or third layer along both the axes, and that 
the surface magnetic moment is smaller than that of the 
inner Pd atoms. The magnetic moment of the central 
iron atom is 3.68 (/xb) which is well compared to other 
theoretical and experimental values, 3.47^iJ and ~ 4[l5j 
(/xb), in FePd alloy. The spatial distribution of magnetic 
moment may be attributed to interactions between the 
inner and surface magnetic moments. The details will be 
discussed elsewhere. 

In summary, an efficient and robust linear scaling 
method has been developed for a wide variety of ma- 
terials including metals. Based on the Krylov subspace 
an embedded cluster problem is solved with an effective 
Hamiltonian consisting of the detailed short range and 
the effective long range contributions. The method is 
regarded as a unified approach connecting the DC and 
recursion methods, and enables us to obtain convergent 
results with the milli-Hartree accuracy for a wide variety 
of materials. The application to a palladium cluster with 
a single iron impurity clearly shows that the method is 
a promising approach for realization of linear scaling ab 
initio calculations for metals. 

The author would like to thank Prof. K. Terakura for 
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